Abstract. Forms of synchrony can greatly simplify modeling, design, and verification of distributed systems. Thus, recent advances in clock synchronization protocols and their adoption hold promise for system design. However, these protocols synchronize the distributed clocks only within a certain tolerance, and there are transient phases while synchronization is still being achieved. Abstractions used for modeling and verification of such systems should accurately capture these imperfections that cause the system to only be “almost synchronized.” In this paper, we present approximate synchrony, a sound and tunable abstraction for verification of almost-synchronous systems. We show how approximate synchrony can be used for verification of both time synchronization protocols and applications running on top of them. We provide an algorithmic approach for constructing this abstraction for symmetric, almost-synchronous systems, a subclass of almost-synchronous systems. Moreover, we show how approximate synchrony also provides a useful strategy to guide state-space exploration. We have implemented approximate synchrony as a part of a model checker and used it to verify models of the Best Master Clock (BMC) algorithm, the core component of the IEEE 1588 precision time protocol, as well as the time-synchronized channel hopping protocol that is part of the IEEE 802.15.4e standard.


1 Introduction

Forms of synchrony can greatly simplify modeling, design, and verification of distributed systems. Traditionally, a common sense of time is established using time-synchronization (clock-synchronization) protocols or systems such as the global positioning system (GPS), the network time protocol (NTP), and the IEEE 1588 [19] precision time protocol (PTP). These protocols, however, synchronize the distributed clocks only within a certain bound. In other words, at any time point, clocks of different nodes can have differing values, but time synchronization ensures that those values are within a specified offset of each other, i.e., they are almost synchronized.

Distributed protocols running on top of time-synchronized nodes are designed under the assumption that while processes at different nodes make independent progress, no process falls very far behind any other. Figure 1 provides examples of such real-world systems. For example, Google Spanner [8] is a distributed, fault-tolerant system that provides consistency guarantees when run on top of nodes that are synchronized using GPS and atomic clocks; wireless sensor networks [27,26] use time-synchronized channel hopping (TSCH) [1] as a standard for time synchronization of sensor nodes in the network; and the IEEE 1588 precision time protocol (PTP) [19] has been adopted in industrial automation, scientific measurement [21], and telecommunication networks.


Correctness of these protocols depends on having some synchrony between different processes or nodes.

When modeling and verifying systems that are almost-synchronous, it is important to compose them using the right concurrency model. One requires a model that lies somewhere between completely synchronous (lock-step progress) and completely asynchronous (unbounded delay). Various such concurrency models have been proposed in the literature, including quasi-synchrony [7,17] and bounded-asynchrony [15]. However, as we discuss in Sec. 7, these models permit behaviors that are typically disallowed in almost-synchronous systems. Alternatively, one can use formalisms for hybrid or timed systems that explicitly model clocks (e.g., [3,2]), but the associated methods (e.g., [20,16]) tend to be less efficient for systems with a huge discrete state space, which is typical for distributed software systems.

Fig. 1. Almost-synchronous systems comprise an application protocol running on top of a time-synchronization layer.

In this paper, we introduce symmetric, almost-synchronous (SAS) systems, a class of distributed systems in which processes have symmetric timing behavior. In our experience, protocols at both the application layer and the time-synchronization layer can be modeled as SAS systems. Additionally, we introduce the notion of approximate synchrony (AS) as a concurrency model for almost-synchronous systems, which also enables one to compute a sound discrete abstraction of a SAS system. Intuitively, a system is approximately-synchronous if the numbers of steps taken by any two processes do not differ by more than a specified bound, denoted $\Delta$. The presence of the parameter $\Delta$ makes approximate synchrony a tunable abstraction method. We demonstrate three different uses of the approximate synchrony abstraction:

1. Verifying time-synchronized systems: Suppose that the system to be verified runs on top of a layer that guarantees time synchronization throughout its execution. In this case, we show that there is a sound value of $\Delta$ that can be computed using a closed-form equation, as described in Sec. 3.2.

2. Verifying systems with recurrent logical behavior: Suppose the system to be verified does not rely on time synchronization, but its traces contain recurrent logical conditions, i.e., a set of global states that are visited repeatedly during the protocol's operation. We show that an iterative approach based on model checking can identify such recurrent behavior and extract a value of $\Delta$ that can be used to compute a sound discrete abstraction for model checking (see Sec. 4). Protocols verifiable with this approach include some at the time-synchronization layer, such as IEEE 1588 [19].

3. Prioritizing state-space exploration: The approximate synchrony abstraction can also be used as a search prioritization technique for model checking. We show in Sec. 6 that, in most cases, it is more efficient for finding bugs to search behaviors for smaller values of $\Delta$ (“more synchronous” behaviors) first.

We present two practical case studies: (i) a time-synchronized channel hopping (TSCH) protocol that is part of the IEEE 802.15.4e [1] standard, and (ii) the best master clock (BMC) algorithm of the IEEE 1588 precision time protocol. The former is a system where the nodes are time-synchronized, while the latter is the case of a system with recurrent logical behavior. Our results show that approximate synchrony can reduce the state space to be explored by orders of magnitude while modeling relevant timing semantics of these protocols, allowing one to verify properties that cannot be verified otherwise. Moreover, we were able to find a so-called “rogue frame” scenario that the IEEE 1588 standards committee had long debated without resolution (see our companion paper written for the IEEE 1588 community [6] for details).

Our abstraction technique can be used with any finite-state model checker. In this paper we implement it on top of the Zing model checker [4], due to its ability to control the model checker's search using an external scheduler that enforces the approximate synchrony condition.

To summarize, this paper makes the following contributions:

- The formalism of symmetric, almost-synchronous (SAS) systems and its use in modeling an important class of distributed systems (Sec. 2);

- A tunable abstraction technique, termed approximate synchrony (Secs. 2 and 3);

- Automatic procedures to derive values of $\Delta$ for sound verification (Secs. 3 and 4);

- An implementation of approximate synchrony in an explicit-state model checker (Sec. 5); and

- The use of approximate synchrony for verification and systematic testing of two real-world protocols, the BMC algorithm (a key component of the IEEE 1588 standard) and the time-synchronized channel hopping protocol (Sec. 6).


2 Formal Model and Approach

In this section, we define clock synchronization precisely and formalize the notion of symmetric, almost-synchronous (SAS) systems, the class of distributed systems we are concerned with in this paper.

2.1 Clocks and Synchronization

Each node in the distributed system has an associated (local) physical clock, which takes a non-negative real value. For purposes of modeling and analysis, we will also assume the presence of an ideal (global) reference clock, denoted $t$. The notation $x(t)$ denotes the value of $x$ when the reference clock has value $t$. Given this notation, we describe the following two basic concepts:

1. Clock Skew: The skew between two clocks $x_i$ and $x_j$ at time $t$ (according to the reference clock) is the difference in their values, $|x_i(t) - x_j(t)|$.

2. Clock Drift: The drift in the rate of a clock $x$ is the difference per unit time of the value of $x$ from the ideal reference clock $t$.

Time synchronization ensures that the skew between any two physical clocks in the network is bounded. The formal definition is as follows.

Definition 1. A distributed system is time-synchronized (or clock-synchronized) if there exists a parameter $\beta$ such that for every pair of nodes $i$ and $j$ and for any $t$,

$$|x_i(t) - x_j(t)| \le \beta \qquad (1)$$

For ease of exposition, we will not explicitly model the details of the dynamics of physical clocks or the updates to them. We will instead abstract the clock dynamics as comprising arbitrary updates to the $x_i$ variables, subject to additional constraints on them such as Eqn. 1 (wherever such assumptions are imposed).
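To make Definition 1 concrete, the following Python sketch checks Eqn. 1 on a finite set of sampled reference times. It assumes the clock values $(x_1(t), \ldots, x_K(t))$ at each sampled $t$ are available as a list; the function names are hypothetical. A finite sample can only refute time synchronization, never establish it for all $t$.

```python
def max_skew(clock_values):
    """Largest pairwise skew |x_i(t) - x_j(t)| for clock readings taken at one reference time t.

    The largest pairwise difference is simply max - min of the readings.
    """
    return max(clock_values) - min(clock_values)


def is_time_synchronized(samples, beta):
    """Check Eqn. 1 on sampled reference times (hypothetical helper).

    samples: list of vectors (x_1(t), ..., x_K(t)), one per sampled reference time t.
    Returns False as soon as some sample has a pairwise skew above beta.
    """
    return all(max_skew(x) <= beta for x in samples)
```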


Example 1. The IEEE 1588 precision time protocol [19] can be implemented so as to bound the physical clock skew to the sub-nanosecond range [21], and the typical clock drift to at most $10^{-4}$ [19].


2.2 Symmetric, Almost-Synchronous Systems

We model the distributed system as a collection of processes, where processes are used to model both the behavior of nodes and that of communication channels. There can be one or more processes executing at a node.

Formally, the system is modeled as the tuple $M_C = (S, \delta, I, ID, \chi, \tau)$ where

- $S$ is the set of discrete states of the system,

- $\delta \subseteq S \times S$ is the transition relation for the system,

- $I \subseteq S$ is the set of initial states,

- $ID = \{1, 2, \ldots, K\}$ is the set of process identifiers,

- $\chi = (x_1, x_2, \ldots, x_K)$ is a vector of local clocks, and

- $\tau = (\tau_1, \tau_2, \ldots, \tau_K)$ is a vector of process timetables. The timetable of the $i$th process, $\tau_i$, is an infinite vector $(\tau_i^0, \tau_i^1, \tau_i^2, \ldots)$ specifying the time instants, according to local clock $x_i$, at which process $i$ executes (steps). In other words, process $i$ makes its $j$th step when $x_i = \tau_i^j$.

For convenience, we will denote the $i$th process by $P_i$. Since in practice the dynamics of physical clocks can be fairly intricate, we choose not to model these details; instead, we assume that the value of a physical clock $x_i$ can vary arbitrarily subject to additional constraints (e.g., Eqn. 1).

The $k$th nominal step size of process $P_i$ is the intended interval between the $(k-1)$th and $k$th steps of $P_i$, viz., $\tau_i^k - \tau_i^{k-1}$. The actual step size of the process is the actual time elapsed between the $(k-1)$th and $k$th steps, according to the ideal reference clock $t$. In general, the latter differs from the former due to clock drift, scheduling jitter, etc.

Motivated by our case studies with the IEEE 1588 and 802.15.4e standards, we impose two restrictions on the class of systems considered in this paper:

1. Common Timetable: For any two processes $P_i$ and $P_j$, $\tau_i = \tau_j$. Note that this does not mean that the processes step synchronously, since their local clocks may report different values at the same time $t$. However, if the system is time-synchronized, then the processes step “almost synchronously.”

2. Bounded Process Step Size: For any process $P_i$, its actual step size lies in an interval $[\sigma^l, \sigma^u]$. This interval is the same for all processes. This restriction arises in practice from the bounded drift of physical clocks.
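Both SAS restrictions can be checked on finite observations. In this minimal sketch (the function names are hypothetical), each process is described by a finite prefix of its timetable and by the reference-clock times $t$ at which it actually stepped; nominal step sizes come from the timetable, actual ones from the reference times.

```python
def nominal_step_sizes(timetable):
    """k-th nominal step size tau^k - tau^(k-1), from a finite timetable prefix."""
    return [b - a for a, b in zip(timetable, timetable[1:])]


def actual_step_sizes(step_times):
    """Reference-clock time elapsed between consecutive steps of one process."""
    return [b - a for a, b in zip(step_times, step_times[1:])]


def satisfies_sas_restrictions(timetables, step_times, sigma_l, sigma_u):
    """Check Common Timetable and Bounded Process Step Size on finite observations.

    timetables: one timetable prefix per process.
    step_times: one list of reference times t per process, at which that process stepped.
    """
    common = all(tt == timetables[0] for tt in timetables)
    bounded = all(
        sigma_l <= d <= sigma_u
        for times in step_times
        for d in actual_step_sizes(times)
    )
    return common and bounded
```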

A set of processes obeying the above restrictions is termed a symmetric, almost-synchronous (SAS) system. The adjective “symmetric” refers only to the timing behavior; note that the logical behavior of different processes can be very different. Note also that SAS systems may or may not be running on top of a time synchronization layer, i.e., SAS systems and time-synchronized systems are orthogonal concepts.

Example 2. The IEEE 1588 protocol can be modeled as a SAS system. All processes intend to step at regular intervals called the announce time interval. The specification [19] states the nominal step size for all processes as 1 second; thus the timetable is the sequence $(0, 1, 2, 3, \ldots)$. However, due to the drift of clocks and other non-idealities such as jitter due to OS scheduling, the step size in typical IEEE 1588 implementations can vary by $\pm 10^{-3}$. From this, the actual step size of processes can be derived to lie in the interval $[0.999, 1.001]$.

Traces and Segments. A timed trace (or simply trace) of the SAS system $M_C$ is a timestamped record of the execution of the system according to the global (ideal) time reference $t$. Formally, a timed trace is a sequence $h_0, h_1, h_2, \ldots$ where each element $h_j$ is a triple $(s_j, \chi_j, t_j)$, where $s_j \in S$ is a discrete (global) state at time $t = t_j$ and $\chi_j = (x_{1,j}, x_{2,j}, \ldots, x_{K,j})$ is the vector of clock values at time $t_j$. For all $j$, at least one process makes a step at time $t_j$, so there exists at least one $i$ and a corresponding $m_i \in \{0, 1, 2, \ldots\}$ such that $x_{i,j}(t_j) = \tau_i^{m_i}$. Moreover, processes step according to their timetables; thus, if any $P_i$ makes its $m_i$th and $l_i$th steps at times $t_j$ and $t_k$ respectively, for $m_i < l_i$, then $x_{i,j}(t_j) = \tau_i^{m_i} < \tau_i^{l_i} = x_{i,k}(t_k)$. Also, by the bounded process step size restriction, if any $P_i$ makes its $m_i$th and $(m_i+1)$th steps at times $t_j$ and $t_k$ respectively (for all $m_i$), then $|t_k - t_j| \in [\sigma^l, \sigma^u]$. Finally, $s_0 \in I$ and $\delta(s_j, s_{j+1})$ holds for all $j \ge 0$, with the transition into $s_j$ occurring at time $t = t_j$.

A trace segment is a (contiguous) subsequence $h_j, h_{j+1}, \ldots, h_l$ of a trace of $M_C$.

2.3 Verification Problem and Approach

The central problem considered in this paper is as follows:

Problem 1. Given a SAS system $M_C$ modeled as above, and a linear temporal logic (LTL) property $\Phi$ with propositions over the discrete states of $M_C$, verify whether $M_C$ satisfies $\Phi$.

One way to model $M_C$ would be as a hybrid system (due to the continuous dynamics of physical clocks), but this approach does not scale well due to the extremely large discrete state space. Instead, we provide a sound discrete abstraction $M_A$ of $M_C$ that preserves the relevant timing semantics of the ‘almost-synchronous’ systems. (Soundness is formalized in Sec. 3.)

There are two phases in our approach:

1. Compute Abstraction Parameter: Using parameters of $M_C$ (relating to clock dynamics), we compute a parameter $\Delta$ characterizing the “approximate synchrony” condition, and use $\Delta$ to generate a sound abstract model $M_A$.

2. Model Checking: We verify the temporal logic property $\Phi$ on the abstract model using finite-state model checking.

The key to this strategy is the first step, which is the focus of the following sections.

3 Approximate Synchrony

We now formalize the concept of approximate synchrony (AS) and explain how it can be used to generate a discrete abstraction of almost-synchronous distributed systems. Approximate synchrony applies both to (segments of) traces and to systems.

Definition 2. (Approximate Synchrony for Traces) A trace (segment) of a SAS system $M_C$ is said to satisfy approximate synchrony (is approximately-synchronous) with parameter $\Delta$ if, for any two processes $P_i$ and $P_j$ in $M_C$, the numbers of steps $N_i$ and $N_j$ taken by the two processes in that trace (segment) satisfy the following condition:

$$|N_i - N_j| \le \Delta$$

Although this definition is in terms of traces of SAS systems, we believe the notion of approximate synchrony is more generally applicable to other distributed systems as well. An early version of this definition appeared in [10].
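Definition 2 compares per-process step counts over a trace segment. A minimal Python sketch, assuming a segment is given as the ordered sequence of (0-based) indices of the processes that stepped; this representation and the function names are illustrative assumptions.

```python
from collections import Counter


def step_counts(steps, num_procs):
    """N_i for each process i in 0..num_procs-1, given the indices of the processes that stepped."""
    c = Counter(steps)
    return [c[i] for i in range(num_procs)]


def segment_satisfies_as(steps, num_procs, delta):
    """Definition 2: |N_i - N_j| <= delta for every pair, i.e. max(N) - min(N) <= delta."""
    n = step_counts(steps, num_procs)
    return max(n) - min(n) <= delta
```

With index 0 standing for P1 and index 1 for P2, the segment `[0, 1, 0, 1, 0, 0, 0]` (P1 reaching its 5th step while P2 has made only 2) gives counts `[5, 2]` and fails for $\Delta = 2$.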

The definition extends to a SAS system in the standard way:

Definition 3. (Approximate Synchrony for Systems) A SAS system $M_C$ satisfies approximate synchrony (is approximately-synchronous) with parameter $\Delta$ if all traces of that system satisfy approximate synchrony with parameter $\Delta$.

We refer to the condition in Definition 3 above as the approximate synchrony (AS) condition with parameter $\Delta$, denoted AS($\Delta$). For example, in Fig. 2, executing step 5 of process P1 before step 3 of process P2 violates the approximate synchrony condition for $\Delta = 2$: at that point P1 has taken 5 steps and P2 only 2, a difference of 3. Note that $\Delta$ quantifies the “approximation” in approximate synchrony. For example, for a (perfectly) synchronous system, $\Delta = 0$, since processes step at the same time instants. For a fully asynchronous system, $\Delta = \infty$, since one process can get arbitrarily far ahead of another.

Fig. 2. AS($\Delta$) violated for $\Delta = 2$ (steps of processes P1 and P2)


3.1 Discrete Approximate Synchrony Abstraction

We now present a discrete abstraction of a SAS system. The key modification is to (i) remove the physical clocks and timetables, and (ii) include instead an explicit scheduler that constrains execution of processes so as to satisfy the approximate synchrony condition AS($\Delta$).

Formally, given a SAS system $M_C = (S, \delta, I, ID, \chi, \tau)$, we construct a $\Delta$-abstract model $M_A$ as the tuple $(S, \delta_A, I, ID, \rho_\Delta)$, where $\rho_\Delta$ is a scheduler process that performs an asynchronous composition of the processes $P_1, P_2, \ldots, P_K$ while enforcing AS($\Delta$). Conceptually, the scheduler $\rho_\Delta$ maintains step counts $N_i$ of the numbers of steps taken by each process $P_i$ from the initial state.⁴ A configuration of $M_A$ is a pair $(s, N)$, where $s \in S$ and $N \in \mathbb{Z}^K$ is the vector of step counts of the processes. The abstract model $M_A$ changes its configuration according to its transition function $\delta_A$, where $\delta_A((s, N), (s', N'))$ iff (i) $\delta(s, s')$ and (ii) $N'_i = N_i + 1$ if $\rho_\Delta$ permits $P_i$ to make a step and $N'_i = N_i$ otherwise.

In an initial state, all processes $P_i$ are enabled to make a step. At each step of $\delta_A$, $\rho_\Delta$ enforces the approximate synchrony condition by enabling $P_i$ to step iff that step does not violate AS($\Delta$). Behaviors of $M_A$ are untimed traces, i.e., sequences of discrete (global) states $s_0, s_1, s_2, \ldots$ where $s_j \in S$, $s_0$ is an initial (global) state, and each transition from $s_j$ to $s_{j+1}$ is consistent with $\delta_A$ defined above.
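The enabling rule of $\rho_\Delta$ can be sketched in Python as follows. This is an illustrative sketch, not the implementation described in Sec. 5: it ignores the process logic $\delta$ entirely and only enumerates the interleavings (sequences of 0-based process indices) that the scheduler would permit, starting from all-zero step counts. The function names are hypothetical.

```python
def enabled(counts, i, delta):
    """rho_Delta lets process i step iff the resulting step counts still satisfy AS(delta)."""
    after = list(counts)
    after[i] += 1
    return max(after) - min(after) <= delta


def as_schedules(num_procs, length, delta):
    """All interleavings of the given length that rho_Delta permits (depth-first enumeration)."""
    result = []

    def extend(prefix, counts):
        if len(prefix) == length:
            result.append(tuple(prefix))
            return
        for i in range(num_procs):
            if enabled(counts, i, delta):
                counts[i] += 1
                prefix.append(i)
                extend(prefix, counts)
                prefix.pop()  # undo the step before trying the next process
                counts[i] -= 1

    extend([], [0] * num_procs)
    return result
```

Because this sketch lets exactly one process step at a time and checks AS($\Delta$) after every step, $\Delta = 0$ admits no step at all here (`as_schedules(2, 1, 0)` is empty). With $\Delta = 1$ and two processes, the permitted schedules are exactly those in which each consecutive pair of steps contains one step of each process.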

Note that approximate synchrony is a tunable timing abstraction: the larger the value of $\Delta$, the more conservative the abstraction. The key question is: for a given system, what value of $\Delta$ constitutes a sound timing abstraction, and how do we automatically compute it? Recall that one model is a sound abstraction of another if and only if every execution trace of the latter (concrete model $M_C$) is also an execution trace of the former (abstract model $M_A$). In our setting, the $\Delta$-abstract and concrete models both capture the protocol logic in an identical manner, and differ only in their timing semantics. The concrete model explicitly models the physical clocks of each process as real-valued variables, as described in Sec. 2. The executions of this model can be represented as timed traces (sequences of timestamped states). On the other hand, in the $\Delta$-abstract model, processes are interleaved asynchronously while respecting the approximate synchrony condition stated in Definition 3. An execution of the $\Delta$-abstract model is an untimed trace (a sequence of states). We equate timed and untimed traces using the “untiming” transformation proposed by Alur and Dill [3], i.e., the traces must be identical with respect to the discrete states.

⁴ The inclusion of step counts may seem to make the model infinite-state. We will show in Sec. 5 how the model checker can be implemented without explicitly including the step counts in the state space.
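The untiming comparison amounts to projecting a timed trace onto its discrete states. A minimal sketch, assuming a timed trace is a list of $(s_j, \chi_j, t_j)$ triples and an untimed trace a sequence of states; both representations and the function names are illustrative.

```python
def untime(timed_trace):
    """Drop the clock vector chi_j and timestamp t_j from each (s_j, chi_j, t_j) triple."""
    return [s for (s, _clocks, _t) in timed_trace]


def same_after_untiming(timed_trace, untimed_trace):
    """True iff both traces visit the same discrete states in the same order."""
    return untime(timed_trace) == list(untimed_trace)
```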


3.2 Computing $\Delta$ for Time-Synchronized Systems

We now address the question of computing a value of $\Delta$ such that the resulting $M_A$ is a sound abstraction of the original SAS system $M_C$. We consider here the case when $M_C$ is a system running on a layer that guarantees time synchronization (Eqn. 1) from the initial state. A second case, when nodes are not time-synchronized and approximate synchrony holds only for segments of the traces of a system, is handled in Sec. 4.

Consider a SAS system in which the physical clocks are always synchronized to within $\beta$, i.e., Equation 1 holds for all time $t$ and $\beta$ is a tight bound computed based on the system configuration. Intuitively, if $\beta > 0$, then $\Delta \ge 1$, since two processes are not guaranteed to step at the same time instants, and so the numbers of steps of two processes can be off by at least one. The main result of this section is that SAS systems that are time-synchronized are also approximately-synchronous, and the value of $\Delta$ is given by the following theorem.

Theorem 1. Any SAS system $M_C$ satisfying Equation 1 is approximately-synchronous with parameter $\Delta = \lceil \beta / \sigma^l \rceil$. (Proof in 10.2)

Suppose the abstract model $M_A$ is constructed as described in Sec. 3.1 with $\Delta$ as given in Theorem 1, where $\sigma^l$ is the lower bound of the step size defined in Sec. 2.2. Then, as a corollary, we can conclude that $M_A$ is a sound abstraction of $M_C$: every trace of $M_C$ satisfies AS($\Delta$) and hence is a trace of $M_A$ after untiming.
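Assuming the bound in Theorem 1 has the form $\Delta = \lceil \beta / \sigma^l \rceil$ (consistent with the remark in Example 3 that a small positive $\beta$ relative to $\sigma^l$ yields $\Delta = 1$), computing $\Delta$ for a time-synchronized system is a one-liner. The sketch below uses exact rational arithmetic so that decimal parameters do not suffer floating-point rounding; the function name and the input values in the checks are hypothetical.

```python
import math
from fractions import Fraction


def delta_for_time_synchronized(beta, sigma_l):
    """Delta = ceil(beta / sigma_l), assumed form of Theorem 1.

    beta: skew bound from Eqn. 1; sigma_l: lower bound on the actual step size.
    Decimal strings such as "0.999" are converted exactly by Fraction.
    """
    return math.ceil(Fraction(beta) / Fraction(sigma_l))
```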

Example 3. The Time-Synchronized Channel Hopping (TSCH) [1] protocol is being adopted as a part of the low-power Medium Access Control standard IEEE 802.15.4e. It can be modeled as a SAS system since it has a time-slotted architecture where processes share the same timetable for making steps. The TSCH protocol has two components: one that operates at the application layer, and one that provides time synchronization, with the former relying upon the latter. We verify the application layer of TSCH, which assumes that nodes in the system are always time-synchronized within a bound called the “guard time,” which corresponds to $\beta$. Moreover, in practice, $\beta$ is much smaller than $\sigma^l$, and thus $\Delta$ is typically 1 for implementations of TSCH.

4 Systems with Recurrent Logical Conditions

We now consider the case of a SAS system that does not execute on top of a layer that guarantees time synchronization (i.e., Eqn. 1 may not hold). We identify a behavior of certain SAS systems, called recurrent logical conditions, that can be exploited for abstraction and verification. Specifically, even though AS($\Delta$) may not hold for the system for any finite $\Delta$, it may still hold for segments of every trace of the system.


Definition 4. (Recurrent Logical Condition) For a SAS system $M_C$, a recurrent logical condition is a predicate logicConv on the state of $M_C$ such that $M_C$ satisfies the LTL property $\mathbf{G}\,\mathbf{F}\,\mathit{logicConv}$.

Our verification approach is based on finding a finite $\Delta$ such that, for every trace of $M_C$, segments of the trace between states satisfying logicConv satisfy AS($\Delta$). This property of system traces can then be exploited for efficient model checking.
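The segment-wise condition can be illustrated by a checker that resets all step counts whenever a state satisfying logicConv is reached. This is a minimal sketch under assumed representations: a trace is a list of (stepping process index, resulting global state) pairs, and `logic_conv` is a hypothetical predicate on states.

```python
def segments_satisfy_as(trace, num_procs, delta, logic_conv):
    """Check AS(delta) on every segment between visits to states satisfying logic_conv.

    trace: list of (i, s) pairs, meaning process i stepped and the system entered state s.
    The step into a logic_conv state is counted in the segment it closes; counts are reset
    afterwards, so the next segment starts from zero.
    """
    counts = [0] * num_procs
    for proc, state in trace:
        counts[proc] += 1
        if max(counts) - min(counts) > delta:
            return False
        if logic_conv(state):
            counts = [0] * num_procs
    return True
```

For example, if process 0 steps three times in a row and every step lands in a logicConv state, the trace as a whole violates AS(1), but every segment between visits to logicConv satisfies it.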

We begin with an example of a recurrent logical condition in the context of the IEEE 1588 protocol (Sec. 4.1). We then present our verification approach based on inferring $\Delta$ for trace segments via iterative use of model checking (Sec. 4.2).


4.1 Example: IEEE 1588 protocol

The IEEE 1588 standard [19], also known as the precision time protocol (PTP), enables precise synchronization of clocks over a network. The protocol consists of two parts: the best master clock (BMC) algorithm and a time synchronization phase. The BMC algorithm is a distributed algorithm whose purpose is two-fold: (i) to elect a unique grandmaster clock that is the best clock in the network, and (ii) to find a unique spanning tree in the network with the grandmaster clock at the root of the tree. The combination of a grandmaster clock and a spanning tree constitutes the global stable configuration known as logical convergence, which corresponds to the recurrent logical condition. The second phase, the time synchronization phase, uses this stable configuration to synchronize or correct the physical clocks (more details in [19]).

Fig. 3. Phases of the IEEE 1588 time-synchronization protocol

Figure 3 gives an overview of the phases of the IEEE 1588 protocol execution. The distributed system starts executing the first phase (i.e., the BMC algorithm) from an initial configuration. Initially, the clocks are not guaranteed to be synchronized to within a bound $\beta$. However, once logical convergence occurs, the clocks are synchronized shortly thereafter. Once the clocks have been synchronized, it is possible for a failure at a node or link to break clock synchronization. The BMC algorithm operates continually, with the goal of ensuring that, if time synchronization is broken, the clocks will be re-synchronized. Thus, a typical 1588 protocol execution is structured as a (potentially infinite) repetition of the two phases: logical convergence, followed by clock synchronization. We exploit this recurrent structure to show in Sec. 4.2 that we can compute a value of $\Delta$ obeyed by segments of any trace of the system. The approach operates by iterative model checking of a specially crafted temporal logic formula.

Note that the time taken by the protocol to logically converge depends on various factors, including network topology and clock drift. In Sec. 6, we demonstrate empirically that the value of $\Delta$ depends on the number of steps (length of the segment) taken by the BMC algorithm to converge, which in turn depends on the factors mentioned above.


4.2 Iterative Algorithm to Compute $\Delta$-Abstraction for Verification

Given a SAS system $M_C$ whose traces have a recurrent structure, and an LTL property $\Phi$, we present the following approach to verify whether $M_C$ satisfies $\Phi$:

1. Define recurrent condition: Guess a recurrent logical condition, logicConv, on the global state of $M_C$.

2. Compute $N_{\min}$: Guess an initial value of $\Delta$, and compute, from parameters $\sigma^l, \sigma^u$ of the processes in $M_C$, a number $N_{\min}$ such that the AS($\Delta$) condition is satisfied on all trace segments where no process makes $N_{\min}$ or more steps. We describe the computation of $N_{\min}$ in more detail below.

3. Verify if $\Delta$ is sound: Verify, using model checking on $M_A$, that every trace segment that starts in an initial state or a state satisfying logicConv and ends in another state in logicConv satisfies AS($\Delta$). This is done by checking that no process makes $N_{\min}$ or more steps in any such segment. Note that verifying $M_A$ in place of $M_C$ is sound, as AS($\Delta$) is obeyed for up to $N_{\min}$ steps from any state. Further details, including the LTL property checked, are provided below.

4. Verify $M_C$ using $\Delta$: If the verification in the preceding step succeeds, then a model checker can verify $\Phi$ on a discrete abstraction $M'_A$ of $M_C$, which, similar to $M_A$, is obtained by dropping physical clocks and timetables, and enforcing the AS($\Delta$) condition on segments between visits to logicConv. Formally, $M'_A = (S, \delta'_A, I, ID, \rho_\Delta)$, where $\delta'_A$ differs from $\delta_A$ only in that, for a configuration $(s, N)$, $N'_i = 0$ for all $i$ if $s' \in \mathit{logicConv}$ (otherwise it is identical to $\delta_A$).

However, if the verification in Step 3 fails, we go back to Step 2, increment $\Delta$, and repeat the process to compute a sound value of $\Delta$.
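The control flow of Steps 2 to 4 can be sketched as a simple loop. The two callbacks are hypothetical stand-ins: `compute_n_min` for the Step 2 computation and `no_process_reaches` for the Step 3 model-checking query on $M_A$. The cap `max_delta` is an added assumption so that the sketch always terminates; the approach described above simply keeps incrementing $\Delta$.

```python
from typing import Callable, Optional


def find_sound_delta(
    compute_n_min: Callable[[int], int],
    no_process_reaches: Callable[[int, int], bool],
    delta: int = 1,
    max_delta: int = 64,
) -> Optional[int]:
    """Return the first delta that passes Step 3, or None if max_delta is exceeded.

    compute_n_min(delta): Step 2, returns N_min for this delta.
    no_process_reaches(n_min, delta): Step 3, True if model checking M_A shows that no
        process makes n_min or more steps in any segment from an initial or logicConv
        state to the next logicConv state.
    """
    while delta <= max_delta:
        n_min = compute_n_min(delta)
        if no_process_reaches(n_min, delta):
            return delta  # sound: Step 4 can model check Phi using this delta
        delta += 1  # Step 3 failed: go back to Step 2 with a larger delta
    return None
```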
Fig. 4. Iterative algorithm for computing $\Delta$ exploiting logical convergence

Figure 4 depicts this iterative approach for the specific case of the BMC algorithm. We now elaborate on Steps 2 and 3 of the approach.

Step 2: Computing $N_{\min}$ for a given $\Delta$. Recall from Sec. 2.2 that the actual step size of a process lies in the interval $[\sigma^l, \sigma^u]$. Let $V_f$ be the fastest process (the one that makes the most steps from the initial state) and $V_s$ the slowest (the one that makes the fewest steps). Denote the corresponding numbers of steps by $N_f$ and $N_s$, respectively. Then the approximate synchrony condition in Definition 3 is always satisfied if $N_f - N_s \le \Delta$. We wish to find the smallest number of steps taken by the fastest process when AS($\Delta$) is violated. We denote this value by $N_{\min}$ and obtain it by formulating and solving an integer linear program.

Suppose first that $V_s$ and $V_f$ begin stepping at the same time $t$. Then, since the time between steps of $V_f$ is at least $\sigma^l$ and that between steps of $V_s$ is at most $\sigma^u$, the total elapsed time must be at least $\sigma^l N_f$ and at most $\sigma^u N_s$, yielding the inequality $\sigma^l N_f \le \sigma^u N_s$.

However, processes need not begin making steps simultaneously. Since each process must make its first step at most $\sigma^u$ seconds into its execution, the maximum initial offset between processes is $\sigma^u$. The smallest value of $N_f$ occurs when the fast process starts $\sigma^u$ time units after the slowest one, yielding the inequality:

$$\sigma^l N_f + \sigma^u \le \sigma^u N_s$$


Given the above analysis, we can set up the following integer linear program (ILP) to solve for $N_{\min}$:

$$\min N_f \quad \text{s.t.} \quad N_f \ge N_s,\;\; N_f - N_s > \Delta,\;\; \sigma^l N_f + \sigma^u \le \sigma^u N_s,\;\; N_f, N_s \ge 1$$

$N_{\min}$ is the optimal value of this ILP. In effect, it gives the fewest steps any process can take (smallest $N_f$) to violate the approximate synchrony condition AS($\Delta$).


Example 4. For the IEEE 1588 protocol, as described in Sec. 2.2, the actual process step sizes lie in $[0.999, 1.001]$. Setting $\Delta = 1$ and solving the above ILP yields $N_{\min} = 1502$.
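Because the ILP has only two integer variables, $N_{\min}$ can also be found by direct search, which is a convenient way to double-check Example 4. The sketch below is ours, not the authors' solver, and `compute_n_min` is a hypothetical name. It reads the offset inequality as non-strict and uses exact rational arithmetic so that comparisons are not disturbed by floating-point rounding. For each candidate $N_f$, the best choice is the largest $N_s$ allowed by $N_f - N_s > \Delta$, since that maximizes the right-hand side $\sigma^u N_s$.

```python
from fractions import Fraction


def compute_n_min(sigma_l: str, sigma_u: str, delta: int) -> int:
    """Smallest N_f for which some N_s satisfies the Step 2 ILP constraints:
        N_f >= N_s,  N_f - N_s > delta,  sigma_l*N_f + sigma_u <= sigma_u*N_s,  N_s >= 1.
    Step sizes are passed as decimal strings and handled as exact Fractions."""
    lo, hi = Fraction(sigma_l), Fraction(sigma_u)
    if not 0 < lo < hi:
        # With sigma_l >= sigma_u the offset constraint can never hold: no solution.
        raise ValueError("expected 0 < sigma_l < sigma_u")
    n_f = delta + 2                  # smallest N_f that leaves room for N_s >= 1
    while True:
        n_s = n_f - delta - 1        # largest N_s with N_f - N_s > delta
        if lo * n_f + hi <= hi * n_s:
            return n_f
        n_f += 1
```

`compute_n_min("0.999", "1.001", 1)` returns 1502, matching Example 4. Solving the constraint for $N_f$ gives the closed form $N_{\min} = \lceil \sigma^u(\Delta + 2)/(\sigma^u - \sigma^l) \rceil$ under the same reading, here $\lceil 1501.5 \rceil = 1502$.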

Step 3: Temporal Logic Property. Once $N_{\min}$ is computed, we verify on the discrete abstraction $M_A$ whether, from any state satisfying $I \lor logicConv$, the model reaches a state satisfying logicConv in fewer than $N_{\min}$ steps. This also verifies that all traces of the BMC algorithm satisfy the recurrent logicConv property and that the segments between visits to logicConv satisfy AS($\Delta$). We do this by invoking a model checker to verify the following LTL property, which references the variables $N_i$ recording the number of steps of process $V_i$:


$$(I \lor logicConv) \Rightarrow \mathbf{F}\Big[logicConv \land \bigwedge_i \big(0 < N_i < N_{\min}\big)\Big] \qquad (2)$$
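To make Property (2) concrete, the sketch below evaluates it on a single finite trace whose positions pair a state with its step-count vector $(N_1, \ldots, N_n)$. It is an illustration under a finite-trace reading of $\mathbf{F}$ (some position of the trace must satisfy the bracketed condition), not how the model checker discharges the property; the function and parameter names are hypothetical.

```python
from typing import Any, Callable, Sequence, Tuple

Position = Tuple[Any, Tuple[int, ...]]   # (state, step counts N_1..N_n)


def holds_property_2(
    trace: Sequence[Position],
    premise: Callable[[Any], bool],       # I or logicConv
    logic_conv: Callable[[Any], bool],
    n_min: int,
) -> bool:
    """Property (2) on a finite trace: if the first state satisfies the premise,
    some position must satisfy logicConv with 0 < N_i < N_min for every i."""
    if not trace or not premise(trace[0][0]):
        return True                       # implication with a false premise
    return any(
        logic_conv(state) and all(0 < n < n_min for n in counts)
        for state, counts in trace
    )
```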


We show in Sec. 5 how to implement the above check without explicitly including the $N_i$ variables in the system state. Note that it suffices to verify the above property on the discrete abstraction $M_A$ constrained by the scheduler $\rho_A$, because we explore no more than $N_{\min}$ steps of any process, and so $M_A$ is a sound abstraction. The overall soundness result is formalized below.

Theorem 2. If the abstract model $M_A$ satisfies Property 2, then all traces of the concrete model $M_C$ are traces of the model $M_A$ (after untiming). (Proof in Sec. 10.2.)


In Sec. 6, we report on our experiments verifying properties of the BMC algorithm by model checking the discrete abstract model $M_A$ as described above.


5  Model  Checking  with  Approximate  Synchrony 


We implemented approximate synchrony within ZING [4], an explicit-state model checker. ZING performs a “constrained” asynchronous composition of processes, using an external scheduler to guide the interleaving. Approximate synchrony is enforced by an external scheduler that explores only traces satisfying AS($\Delta$) by scheduling, in each state, only those processes whose steps will not violate AS($\Delta$).
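The scheduler's filtering rule can be sketched as follows: given the current step counts, a process is offered to the explorer only if taking one more step keeps every pairwise difference within $\Delta$. Since $\forall s_1, s_2 : |s_1 - s_2| \le \Delta$ is equivalent to $\max - \min \le \Delta$, one comparison per candidate suffices. `as_enabled` is a hypothetical helper, not ZING's scheduler API.

```python
from typing import List, Sequence


def as_enabled(steps: Sequence[int], delta: int) -> List[int]:
    """Processes whose next step keeps the step counts within AS(delta)."""
    enabled = []
    for i in range(len(steps)):
        after = list(steps)
        after[i] += 1                          # counts if process i steps next
        if max(after) - min(after) <= delta:   # all pairwise gaps <= delta
            enabled.append(i)
    return enabled
```

For example, `as_enabled([2, 1, 1], 1)` returns `[1, 2]`: process 0 is held back until the others catch up. Under this one-step-at-a-time reading, $\Delta = 0$ enables nothing once all counts are equal, so the sketch does not capture how lockstep exploration is scheduled.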

Section 4 described an iterative approach to verify whether a $\Delta$-abstract model of a protocol is sound. The soundness proof depends on verifying Property 2. A naive approach to checking this property would be to include a local variable $N_i$ in each process, as part of the process state, to track the number of steps executed by that process; this causes state-space explosion. Instead, we store the $N_i$ information for each process outside the system state, as part of the model checker's explorer.

The algorithm in Fig. 5 performs a systematic bounded depth-first search (DFS). To check whether all traces of length $N_{\min}$ satisfy eventual logical convergence under the AS($\Delta$) constraint, we enforce two bounds: first, the final depth bound is $(N_{\min} + \Delta)$; second, in each state a process is enabled only if executing that process does not violate AS($\Delta$). If a state satisfies logicConv, we terminate the search along that path.

The BoundedDFS function is called recursively on each successor state and explores only those traces that satisfy AS($\Delta$). If the number of steps executed by a process reaches $N_{\min}$, the logicConv monitor is invoked to assert that $s' \models logicConv$ (i.e., that we have reached a logical convergence state); if the assertion fails, we increment the value of $\Delta$ as described in Sec. 4.2. The $N_{\min}$ and $\Delta$ values are derived as explained in Sec. 4.2. StateTable is a map from each reachable state to the tuple of step counts with which it was last explored; steps' is the vector of the number of steps executed by each process and is stored as a list of integers. IncElement(i, t) increments the $i$-th element of tuple $t$ and returns the updated tuple. CheckASCond(steps') checks the condition $\forall s_1, s_2 \in steps' : |s_1 - s_2| \le \Delta$.

As an optimization, to avoid re-exploring a state that may not lead to new states, we do not re-explore a state if it is revisited with steps' greater than the tuple it was last visited with. The operator $>_{pt}$ performs a pointwise comparison of the integer tuples. We show in the following section that this implementation yields a significant state-space reduction.


var StateTable : Dictionary(State, List(int));

BoundedDFS(s : State) {
    var i : int, s' : State, steps' : List(int);
    i := 0;
    while (i < NoOfProcesses(s)) {
        steps' := IncElement(i, StateTable[s]);
        if ¬CheckASCond(steps')
           ∨ steps'[i] > (Nmin + Δ)
           ∨ s ⊨ logicConv then {
            i := i + 1;    // process i not enabled, or this path has converged
            continue;
        }
        s' := NextState(s, i);
        if steps'[i] = Nmin then
            assert(s' ⊨ logicConv);
        if s' ∉ Domain(StateTable)
           ∨ ¬(steps' >pt StateTable[s']) then {
            StateTable[s'] := steps';
            BoundedDFS(s');
        }
        i := i + 1;
    }
}

Verify() {
    StateTable[s_initial] := [0, ..., 0];   // zero steps for every process
    BoundedDFS(s_initial);
}


Fig.  5.  Algorithm  for  Verification  of  Property  2 
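A minimal Python rendering of Fig. 5 is sketched below, assuming a deterministic `next_state(s, i)` (the figure's NextState) and hashable states; the function and parameter names are ours. It reads $>_{pt}$ as pointwise $\ge$, so a state revisited with identical counts is also skipped, and it returns False where the figure's assert would fail.

```python
from typing import Any, Callable, Dict, Tuple

Steps = Tuple[int, ...]


def verify_property_2(
    s_init: Any,
    n_procs: int,
    next_state: Callable[[Any, int], Any],   # NextState(s, i), assumed deterministic
    logic_conv: Callable[[Any], bool],
    n_min: int,
    delta: int,
) -> bool:
    """Bounded DFS in the style of Fig. 5. Returns False where the figure's
    assert would fail (a process takes its N_min-th step outside logicConv)."""
    table: Dict[Any, Steps] = {s_init: (0,) * n_procs}   # StateTable

    def as_ok(steps: Steps) -> bool:                      # CheckASCond
        return max(steps) - min(steps) <= delta

    def dfs(s: Any) -> bool:                              # BoundedDFS
        if logic_conv(s):                                 # converged: stop this path
            return True
        for i in range(n_procs):
            cur = table[s]
            steps = cur[:i] + (cur[i] + 1,) + cur[i + 1:]  # IncElement(i, cur)
            if not as_ok(steps) or steps[i] > n_min + delta:
                continue                                  # process i not enabled
            s2 = next_state(s, i)
            if steps[i] == n_min and not logic_conv(s2):
                return False                              # assertion fails
            old = table.get(s2)
            # Skip s2 if it was already explored with pointwise >= step counts.
            if old is None or not all(a >= b for a, b in zip(steps, old)):
                table[s2] = steps
                if not dfs(s2):
                    return False
        return True

    return dfs(s_init)
```

For example, with two processes whose state is just their step counts (`next_state` increments component `i`), `logic_conv = lambda s: s == (2, 2)`, $N_{\min} = 3$ and $\Delta = 1$, every path reaches (2, 2) before either process takes its third step, so the function returns True; with `s == (5, 5)` as the convergence predicate it returns False.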


6  Evaluation 


In this section, we present our empirical evaluation of the approximate synchrony abstraction, guided by the following goals:

• Verify two real-world standard protocols: (1) the best master clock algorithm in IEEE 1588 and (2) the time-synchronized channel hopping protocol in IEEE 802.15.4e.

• Evaluate whether we can verify properties that cannot be verified with full asynchrony (either by reducing the state space or by capturing relevant timing constraints).

• Evaluate approximate synchrony as an iterative bounding technique for finding bugs efficiently in almost-synchronous systems.


6.1  Modeling  and  Experimental  Setup 

We model the system in P [11], a domain-specific language for writing event-driven protocols. A protocol model in P is a collection of state machines interacting with each other via asynchronous events or messages. The P compiler generates a model for systematic exploration by ZING [4]. P also provides ways of writing LTL properties as monitors that are synchronously composed with the model. Both case studies, the BMC algorithm and the TSCH protocol, are modeled using P. Each node in the protocol is modeled as a separate P state machine. Faults and message losses in the protocol are modeled as nondeterministic choices.


| Protocol | Temporal Property | Description |
|---|---|---|
| BMCA | $\mathbf{F}\,\mathbf{G}(logicConv)$ | Eventually the BMC algorithm stabilizes with a unique spanning tree having the grandmaster at its root. The system is said to be in a logicConv state when it has converged to the expected spanning tree. |
| TSCH | $\bigwedge_{i \in N} \mathbf{G}(\neg desynched_i)$ | A node in TSCH is said to be desynched if it fails to synchronize with its master within the threshold period. The desired property of a correct system is that the nodes are always synchronized. |

Table 1. Temporal properties verified for the case studies


All experiments were performed on a 64-bit Windows server with an Intel Xeon E5-2440 at 2.40 GHz (12 cores/24 threads) and 160 GB of memory. ZING can exploit parallelism, as its iterative depth-first search algorithm is completely parallelized. All timing results reported in this section are for ZING running with 24 threads. We use the number of states explored and the time taken to explore them as the comparison metrics.

6.2  Verification  and  Testing  using  Approximate  Synchrony 

We applied approximate synchrony in three different contexts: (1) the Time Synchronized Channel Hopping protocol (a time-synchronized system), (2) the Best Master Clock algorithm in IEEE 1588 (exploiting a recurrent logical condition), and (3) approximate synchrony as a bounding technique for finding bugs.

Verification of the TSCH Protocol. Time Synchronized Channel Hopping (TSCH) is a Medium Access Control scheme that enables low-power operation in wireless sensor networks using time synchronization. It assumes that the clocks are always time-synchronized within a bound, referred to as the ‘guard’ time in the standard. The low-power operation of the system depends on the sensor nodes being able to maintain synchronization (desynchronization property in Table 1). A central server broadcasts the global schedule that instructs each sensor node when to perform operations. Whether the system satisfies the desynchronization property depends on this global schedule, and the standard provides no recommendation on these schedules.

We modeled TSCH as an SAS system and used Theorem 1 to calculate the value of $\Delta$⁵. We verified the desynchronization property (Table 1) in the presence of failures such as message loss and interference in the wireless network. For the experiments we considered three schedules: (1) round-robin: nodes are scheduled in a round-robin fashion; (2) shared with random back-off: all the schedule slots are shared, and conflicts are resolved using random back-off; (3) priority scheduler: nodes are assigned fixed priorities, and conflicts are resolved based on priority.

We were able to verify whether the property was satisfied for a given topology under each global schedule, and to generate a counterexample otherwise (Table 2); this helped the TSCH system developers choose the right schedules for low-power operation. Using the sound approximate synchrony abstraction (with $\Delta = 1$), we could accurately capture the “almost synchronous” behavior of the TSCH system.


Verification of BMC Algorithm

| Network Topology (#Nodes) | Safety: Fully Asynchronous Model, States Explored | Time (h:mm) | Property Proved | Safety: Model with Approximate Synchrony, $\Delta$ | States Explored | Time (h:mm) | Property Proved | Convergence: Model with Approximate Synchrony, $\Delta$ | States Explored | Time (h:mm) | Property Proved |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Linear(5) | 1.2E+9 | 7:12 | Yes | 1 | 9.5E+5 | 0:35 | Yes | 1 | 5.3E+8 | 6:33 | Yes |
| Star(5) | 2.4E+10 | 9:40 | Yes | 1 | 5.8E+5 | 0:54 | Yes | 1 | 4.1E+7 | 5:10 | Yes |
| Random(5) | 9.19E+9 | 9:01 | Yes | 2 | 5.5E+6 | 1:44 | Yes | 2 | 1.8E+9 | 9:10 | Yes |
| Ring(5) | 7.1E+12* | * | No | 1 | 4.8E+7 | 3:44 | Yes | 1 | 8E+9 | 8:04 | Yes |
| Linear(7) | 1.4E+13* | * | No | 1 | 4.6E+7 | 3:05 | Yes | 1 | 1.0E+8 | 6:21 | Yes |
| Star(7) | 1.1E+13* | * | No | 2 | 3.7E+8 | 5:06 | Yes | 2 | 3.3E+10 | 13:34 | Yes |
| Ring(7) | 3.3E+12* | * | No | 2 | 6.8E+8 | 8:04 | Yes | 2 | 2.1E+10 | 11:11 | Yes |
| Random(6) | 1.1E+13* | * | No | 3 | 5.7E+9 | 6:00 | Yes | 3 | 1.3E+10 | 10:34 | Yes |
| Random(7) | 1.1E+13* | * | No | 3 | 8.1E+8 | 7:11 | Yes | 3 | 9.9E+10 | 10:11 | Yes |

Verification of TSCH Protocol

| Network Topology (#Nodes) | Round-Robin Scheduler: States Explored | Time (h:mm) | Property Satisfied | Shared with CSMA: States Explored | Time (h:mm) | Property Satisfied | Priority Scheduler: States Explored | Time (h:mm) | Property Satisfied |
|---|---|---|---|---|---|---|---|---|---|
| Linear(5) | 4.4E+4 | 0:20 | Yes | 1.2E+2# | 0:03 | No | 2.4E+3# | 0:09 | No |
| Random(5) | 3.6E+2# | 0:05 | No | 6.2E+3# | 0:12 | No | 1.9E+6 | 0:35 | Yes |
| Mesh(5) | 1.7E+7 | 4:05 | Yes | 9.1E+6 | 2:01 | Yes | 9.3E+5 | 0:31 | Yes |

\* denotes end of exploration because the model checker ran out of memory; # denotes that the property was violated and a counterexample is reported.

Table 2. Verification results using Approximate Synchrony.


Verification of BMC Algorithm. The BMC algorithm is a core component of the IEEE 1588 precision time protocol. It is a distributed, fault-tolerant protocol in which nodes perform operations periodically to converge on a unique hierarchical tree structure, referred to as the logical convergence state in Sec. 4. Note that the convergence property for BMCA holds only in the presence of almost synchrony; the algorithm does not guarantee convergence in the presence of unbounded process delay or message delay. Hence, it is essential to verify BMC using the right form of synchrony.

We generated various verification instances by changing configuration parameters such as the number of nodes, clock characteristics, and the network topology. The results in Table 2 for the BMC algorithm are for networks of 5 to 7 nodes with linear, star, ring, and random topologies. The $\Delta$ value used for verification of each of these configurations was derived using the iterative approach described in Sec. 4.2. The results demonstrate that the value of $\Delta$ required to construct the sound abstraction varies depending on network topology and clock dynamics. Table 2 shows the total number of states explored and the time taken by the model checker to prove the safety and convergence properties (Table 1) using the sound $\Delta$-abstract model. The approximate synchrony abstraction is orders of magnitude faster because it explores a reduced state space. The BMC algorithm satisfies the safety invariant even in the presence of complete asynchrony. To demonstrate the efficiency of approximate synchrony, we also conducted the experiments with a completely asynchronous composition, exploring all possible interleavings (for safety properties). The completely asynchronous model is simple to implement but fails to prove the safety property for most of the topologies, as the model checker runs out of memory.

⁵ For the system of nodes under consideration, with a maximum clock skew $\varepsilon = 120\,\mu s$ and a nominal step size of 100 ms, the value of $\Delta$ is 1.

An upshot of our approach is that we are the first to prove that the BMC algorithm in IEEE 1588 achieves logical convergence to a unique stable state for some interesting configurations. This was possible because of the sound and tunable approximate synchrony abstraction. Although experiments with 5 to 7 nodes may seem small, networks of this size do occur in practice, e.g., in industrial automation, where small teams of networked robots operate on a factory floor.

Endlessly circulating (rogue) frames in IEEE 1588: The possibility of an endlessly circulating frame in a 1588 network has been debated for a while in the standards committee. Using a formal model of the BMC algorithm under approximate synchrony, we were able to reproduce a scenario where a rogue frame could occur. The existence of a rogue frame can lead to network congestion or cause the BMC algorithm to never converge. The counterexample was cross-validated using simulation and is described in detail in [6]. It was well received by the IEEE 1588 standards committee.


| Buggy Model | Iterative Depth Bounding with Random Search: Depth | States Explored | Time (h:mm) | Non-Iterative AS: $\Delta$ | States Explored | Time (h:mm) | Iterative AS: $\Delta$ | States Explored | Time (h:mm) |
|---|---|---|---|---|---|---|---|---|---|
| BMCA_Bug_1 | 51 | 1.4E+3 | 0:05 | 2 | 1.1E+3 | 0:04 | 0 | 2.1E+2 | 0:02 |
| BMCA_Bug_2 | 64 | 5.9E+5 | 0:15 | 2 | 6.1E+4 | 0:14 | 0 | 1.6E+3 | 0:04 |
| BMCA_Bug_3 | 101 | 9.4E+7 | 0:45 | 3 | 3.3E+5 | 0:17 | 1 | 9.1E+2 | 0:05 |
| ROGUE_FRAME_Bug_1 | 44 | 3.9E+5 | 0:18 | 2 | 9.7E+6 | 0:29 | 1 | 5.6E+4 | 0:12 |
| ROGUE_FRAME_Bug_2 | 87 | 4.4E+4 | 0:09 | 2 | 2.1E+3 | 0:05 | 1 | 1.1E+3 | 0:03 |
| SPT_Bug_1 | 121 | 8.4E+8 | 1:05 | 3 | 8.1E+4 | 0:11 | 0 | 5.5E+2 | 0:04 |

Table 3. Iterative approximate synchrony with bound $\Delta$ for finding bugs faster.
Approximate Synchrony as a Search Prioritization Technique. Another interesting application of approximate synchrony is as a bounding technique to prioritize search. We collected buggy models during the process of modeling the BMC algorithm and used them as benchmarks, along with a buggy instance of Perlman's Spanning Tree Protocol [23] (SPT). We used AS as an iterative bounding technique, starting with $\Delta = 0$ and incrementing $\Delta$ after each iteration. For $\Delta = 0$, the model checker explores only synchronous system behaviors; increasing the value can be viewed as incrementally adding bounded asynchronous behaviors. Table 3 compares iterative AS, non-iterative AS with a fixed value of $\Delta$ taken from Table 2, and iterative depth bounding with random search. The number of states explored and the corresponding time taken to find the bug are used as the comparison metrics. The results demonstrate that most of the bugs are found at small values of $\Delta$ (hence iterative search is beneficial for finding bugs). Some bugs that occur only in the presence of asynchrony, like the rogue frame error, were found with minimal asynchrony in the system ($\Delta = 1$). These results confirm that prioritizing search based on approximate synchrony is beneficial in finding bugs. Other bounding techniques, such as delay bounding [14] and context bounding [22], can be combined with approximate synchrony, but this is left for future work.


7  Related  Work 

The concept of partial synchrony has been well studied in the theory of distributed systems [13,12,24]. There are many ways to model partial synchrony depending on the type of system and the end goal (e.g., formal verification). Approximate synchrony is one such approach, which we contrast against the most closely related work below.
Hybrid/Timed Modeling: The choice of modeling formalism greatly influences the verification approach. A time-synchronized system can be modeled as a hybrid system [2]. However, unlike traditional hybrid systems examples from the domain of control, the discrete part of the state space for these protocols is very large. Because of this, we observed that leading hybrid systems verification tools, such as SpaceEx [16], cannot explore the entire state space.

There has been work on modeling timed protocols using real-time formalisms such as timed automata [3], where the derivatives of all continuous-time variables are equal to one. While tools based on the theory of timed automata do not explicitly support modeling and verification of multirate timed systems [20], there do exist techniques for approximating multirate clocks. For instance, Huang et al. [18] propose the use of integer clocks on top of UPPAAL models. Daws and Yovine [9] show how multirate timed systems can be over-approximated into timed automata. Vaandrager and Groot [28] model a clock that can proceed at different rates by defining a clock model consisting of one location and one self-transition. Such models only approximately represent multirate timed systems. By contrast, our approach algorithmically constructs abstractions that can be refined to be more precise by tuning the value of $\Delta$, and results in a sound untimed model that can be directly checked by a finite-state model checker.
Synchrony and Asynchrony: There have been numerous efforts devoted to mixing synchronous and asynchronous modeling. Multiclock Esterel [25] and communicating reactive processes (CRP) [5] extend the synchronous language Esterel to support a mix of synchronous and asynchronous processes. Bounded asynchrony is another such modeling technique, with applications to biological systems [15]. It can be used to model systems in which processes can have different but constant rates, and can be interleaved asynchronously (with possible stuttering) before they all synchronize at the end of a global “period.” Approximate synchrony has no such synchronizing global period. The quasi-synchronous (QS) [7,17] approach is designed for communicating processes that are periodic and have almost the same period. QS [17] is defined as “Between any two successive activations of one period process, the process on any other process is activated either 0, 1, or at most 2 times”. As a consequence, in both quasi-synchrony and bounded asynchrony, the difference between the absolute numbers of activations of two different processes can grow unboundedly. In contrast, the definition of AS does not allow this difference to grow unboundedly.


8  Conclusion 


This paper has introduced two new concepts: a class of distributed systems termed symmetric, almost-synchronous (SAS) systems, and approximate synchrony, an abstraction method for such systems. We evaluated the applicability of approximate synchrony for verification in two different contexts: (i) application-layer protocols running on top of time-synchronized systems (TSCH), and (ii) systems that do not rely on time synchronization but exhibit recurrent logical behavior (BMC algorithm). We also described an interesting search prioritization technique based on approximate synchrony, with the key insight that prioritizing synchronous behaviors can help find bugs faster.

In this paper, we focus on verifying protocols that fit the SAS formalism defined in Sec. 2.2. While other protocols whose behavior and correctness rely on the values of timestamps do not natively fit into the SAS formalism, they can be abstracted using suitable methods (e.g., using a state variable to model a local timer for a process, whose value is incremented on each step of that process; with approximate synchrony, the timer values across different processes will not differ by more than $\Delta$). Evaluating such abstractions for protocols like Google Spanner and others that use timestamps would be an interesting next step.
10 Appendix

10.1  Linear  Temporal  Logic 

Given a finite set of atomic propositions $\Sigma$, formulas in linear temporal logic (LTL) are constructed according to the following grammar:

$$\varphi ::= p \mid \neg\varphi \mid \varphi \lor \varphi \mid \mathbf{X}\varphi \mid \varphi\,\mathbf{U}\,\varphi$$

where $p \in \Sigma$ is an atomic proposition, $\mathbf{X}$ is the temporal operator next, and $\mathbf{U}$ is the temporal operator until. Other temporal operators can be derived from these two temporal operators and Boolean operators; for example, “eventually $\varphi$” as $\mathbf{F}\varphi = true\,\mathbf{U}\,\varphi$ and “globally $\varphi$” as $\mathbf{G}\varphi = \neg\mathbf{F}\neg\varphi$.
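The derived operators can be checked mechanically. The sketch below evaluates formulas over a finite trace, where each position is the set of atomic propositions that hold there. Finite traces are an assumption of the sketch (an LTLf-style reading in which $\mathbf{X}$ is false at the last position); the semantics above is over infinite words, so this only illustrates how $\mathbf{F}$ and $\mathbf{G}$ reduce to $\mathbf{U}$ and negation. All names are hypothetical.

```python
from typing import Callable, FrozenSet, Sequence

Trace = Sequence[FrozenSet[str]]           # propositions true at each position
Formula = Callable[[Trace, int], bool]      # satisfaction at position i


def atom(p: str) -> Formula:
    return lambda tr, i: p in tr[i]


def neg(f: Formula) -> Formula:
    return lambda tr, i: not f(tr, i)


def lor(f: Formula, g: Formula) -> Formula:
    return lambda tr, i: f(tr, i) or g(tr, i)


def nxt(f: Formula) -> Formula:             # X f; false at the last position
    return lambda tr, i: i + 1 < len(tr) and f(tr, i + 1)


def until(f: Formula, g: Formula) -> Formula:   # f U g
    return lambda tr, i: any(
        g(tr, k) and all(f(tr, j) for j in range(i, k)) for k in range(i, len(tr))
    )


def true_(tr: Trace, i: int) -> bool:
    return True


def eventually(f: Formula) -> Formula:      # F f = true U f
    return until(true_, f)


def globally(f: Formula) -> Formula:        # G f = not F not f
    return neg(eventually(neg(f)))
```

On such a trace, `eventually(globally(atom("conv")))` expresses the BMCA property $\mathbf{F}\,\mathbf{G}(logicConv)$ from Table 1.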


10.2  Proofs  of  Theorems 


Proof  of  Theorem  1: 


Proof. Consider two arbitrary processes $V_i$ and $V_j$. We show that it is always the case that $|N_i - N_j| \le \lceil \beta / \sigma^l \rceil$.

Consider an arbitrary time point $t$ according to an ideal time reference. Without loss of generality, assume $N_i(t) \ge N_j(t)$ (i.e., that $V_i$ has made at least as many steps as $V_j$) and that $V_j$ has performed a step at time $t$. We seek to bound the number of additional steps that $V_i$ has made over $V_j$.

By the “Common Timetable” assumption, $V_i$ and $V_j$ step at the same values of their respective clocks. Therefore, it must be the case that $X_i \ge X_j$. Further, due to time synchronization, we also have $X_i - X_j \le \beta$. Also, the step size of $V_i$ is bounded below by $\sigma^l$. Thus, the number of additional steps $V_i$ could have taken at time $t$ over $V_j$ is bounded above by

$$\left\lceil \frac{X_i - X_j}{\sigma^l} \right\rceil \le \left\lceil \frac{\beta}{\sigma^l} \right\rceil$$

Thus, $|N_i - N_j| \le \lceil \beta / \sigma^l \rceil$ at time $t$, for any $t$. This yields the desired value of $\Delta$.
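As a worked instance of Theorem 1, the bound can be evaluated with exact arithmetic. Reading the maximum clock skew in footnote 5 of Sec. 6 as $\beta = 120\,\mu s$ and approximating $\sigma^l$ by the 100 ms nominal step size (an assumption; the true lower bound is slightly smaller) gives $\Delta = \lceil 0.0012 \rceil = 1$, the value used for TSCH. The helper name is hypothetical.

```python
import math
from fractions import Fraction


def delta_from_theorem_1(beta: Fraction, sigma_l: Fraction) -> int:
    """Delta = ceil(beta / sigma_l), with beta bounding X_i - X_j and sigma_l
    lower-bounding the step size; both must use the same time unit."""
    if beta < 0 or sigma_l <= 0:
        raise ValueError("expected beta >= 0 and sigma_l > 0")
    return math.ceil(beta / sigma_l)


# 120 microseconds of skew against (roughly) a 100 ms step, both in seconds.
print(delta_from_theorem_1(Fraction(120, 10**6), Fraction(100, 1000)))  # prints 1
```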


Proof  of  Theorem  2: 


Proof. From the computation of $N_{\min}$ we know that if, in any trace segment, no process makes $N_{\min}$ or more steps, then that trace segment satisfies AS($\Delta$). In particular, this applies to every trace of the concrete model $M_C$.

Since $M_A$ satisfies Property 2, every segment of a trace of $M_A$ starting in a state satisfying $I \lor logicConv$ must reach another state in logicConv before any process makes $N_{\min}$ steps. In other words, every trace of $M_A$ has the form

$$s_0, s_1, s_2, \ldots, s_{i_1}, \ldots, s_{i_2}, \ldots, s_{i_3}, \ldots$$

where $s_0 \in I$ and $s_{i_j} \in logicConv$ for all $j$, and furthermore, during the trace segments between states $s_0, s_{i_1}, s_{i_2}$, etc., no process makes $N_{\min}$ or more steps.

We now argue that this type of recurrent behavior is also present in traces of $M_C$. Suppose, to the contrary, that there is a trace of $M_C$ with a prefix of the form $(s_0, X_0, t_0), (s_1, X_1, t_1), (s_2, X_2, t_2), \ldots, (s_k, X_k, t_k)$, where $s_0 \in I$, $s_i \notin logicConv$ for any $i$, and some process makes its $N_{\min}$-th step with the transition into $s_k$. Note that the untimed prefix $s_0, s_1, s_2, \ldots, s_{k-1}$ is a valid prefix of some trace of $M_A$, since no process has made $N_{\min}$ or more steps, and hence AS($\Delta$) holds. However, we know that $M_A$ satisfies Property 2, which implies that some state $s_i$, $i = 0, 1, \ldots, k-1$, must be in logicConv. This contradicts our hypothesis and implies that all traces of $M_C$ must visit a state in logicConv infinitely often, with no process making $N_{\min}$ or more steps between visits. By construction of $M_A$, the untiming of each of these traces is a trace of $M_A$, from which the theorem follows.


10.3  Implementation  of  AS  as  scheduler 

Section 5 gave an overview of how we implemented AS as an external scheduler in ZING. The set of processes that may execute in the current state under AS($\Delta$) is controlled externally, and we do not add any states to the existing state space of the model.


We used an explicit-state model checker with state caching, so that a state that has already been explored is not re-explored when visited again. Consider two cases. In the first case, the scheduler state is part of the system state (the scheduler is modeled as a separate process and composed with the other processes in the system); hence, state-caching-based search is sound and will not miss any states. In our case, since the scheduler state is not part of the system state, soundness can be lost: we might visit the same program state with a different scheduler state (which can mean that outgoing transitions are enabled that were not enabled the last time), and the state should then be re-explored with the new scheduler state. But because only the program state is cached, the explorer assumes that all possible transitions from this state have already been explored, does not re-explore it, and can miss reachable states.

The fix is to maintain, as part of the system state, minimal information that distinguishes the program state when it is visited with different scheduler states. The complex logic of evaluating which process to execute next and enforcing the AS condition remains in the external scheduler. We did not add to the system a new scheduler process that counts the number of steps executed by each process, which saves a large number of states.
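The text leaves open exactly what “minimal information” is kept. One plausible choice, assumed here and not taken from the paper, is to key the visited set on the program state together with each process's step count relative to the slowest process. Under the AS($\Delta$) rule, which processes are enabled depends only on these offsets, and each offset lies in $[0, \Delta]$, so a program state is split into at most $(\Delta + 1)^n$ cached variants. The sketch ignores the absolute $N_{\min}$ depth bound, which depends on raw counts rather than offsets; `cache_key` and `VisitedSet` are hypothetical names.

```python
from collections.abc import Hashable
from typing import Set, Tuple

Key = Tuple[Hashable, Tuple[int, ...]]


def cache_key(program_state: Hashable, steps: Tuple[int, ...]) -> Key:
    """Program state plus step counts normalized to the slowest process."""
    low = min(steps)
    return program_state, tuple(n - low for n in steps)


class VisitedSet:
    """State cache that distinguishes visits with different scheduler views."""

    def __init__(self) -> None:
        self._seen: Set[Key] = set()

    def first_visit(self, program_state: Hashable, steps: Tuple[int, ...]) -> bool:
        """True (and record the key) iff this (state, offsets) pair is new."""
        key = cache_key(program_state, steps)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

For example, visiting program state `"s"` with counts `(3, 2)` and later with `(5, 4)` yields the same key, so the second visit is skipped, whereas `(2, 3)` enables a different process and is explored.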


10.4  Additional  Information  about  Case  Studies 

In this section, we provide an overview of the two motivating case studies. The first case study concerns verification of the best master clock algorithm in the IEEE 1588 precision time protocol [19], where clocks are not (initially) synchronized, but the drift of the clocks is bounded. This protocol is representative of a class we term a posteriori time-synchronized, since it forms the first phase of a time synchronization protocol. The second case study concerns time-synchronized channel hopping (TSCH), which is part of the IEEE 802.15.4e standard [1]. This latter case study shows an example where the correctness properties are proven for an a priori time-synchronized system.


IEEE 1588 Precision Time Protocol  The IEEE 1588 standard [19], also known as the precision time protocol (PTP), is a distributed protocol that enables precise synchronization of clocks over a communication network. The protocol consists of two parts: the best master clock (BMC) algorithm and a time synchronization phase. The BMC algorithm is a distributed algorithm, and its purpose is twofold:

(i) to elect one grandmaster clock, the best clock in the network;

(ii) to find a unique spanning tree of the network, with the grandmaster clock as its root.

Thus, the goal of the BMC algorithm can be characterized as convergence to a particular stable configuration, comprising agreement on the network topology and the leader (grandmaster clock). The time synchronization phase uses the spanning tree to synchronize all clocks in the network with the grandmaster clock. In this case study, we focus on the correctness of the BMC algorithm, not on the time synchronization phase.
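Goal (i), agreement on a single best clock, can be illustrated with a minimal flooding sketch. Each clock repeatedly adopts the best clock announced by itself or its neighbours until nothing changes. This is not the IEEE 1588 data set comparison. The sketch assumes synchronous rounds, point-to-point links, and a hypothetical quality key `(priority, identifier)` in which a smaller tuple means a better clock. The name `elect_grandmaster` is illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical quality key: (priority, clock identifier); smaller is better.
Quality = Tuple[int, str]

def elect_grandmaster(quality: Dict[str, Quality],
                      links: List[Tuple[str, str]]) -> Tuple[Dict[str, str], int]:
    """Return each clock's believed best clock and the number of rounds
    in which some belief changed before a fixed point was reached."""
    neighbors: Dict[str, List[str]] = {c: [] for c in quality}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Initially every clock believes it is the best clock it knows of.
    best = {c: c for c in quality}
    rounds = 0
    while True:
        # One synchronous round: every clock "announces" its current best
        # clock to its neighbours and keeps the best candidate it hears.
        new_best = {}
        for c in quality:
            candidates = [best[c]] + [best[n] for n in neighbors[c]]
            new_best[c] = min(candidates, key=lambda x: quality[x])
        if new_best == best:
            return best, rounds
        best = new_best
        rounds += 1

# Hypothetical four-clock topology (not the network of Fig. 6).
quality = {"C1": (1, "C1"), "C2": (2, "C2"), "C3": (2, "C3"), "C4": (3, "C4")}
links = [("C1", "C2"), ("C1", "C3"), ("C2", "C3"), ("C2", "C4"), ("C3", "C4")]
best, rounds = elect_grandmaster(quality, links)  # all clocks agree on C1
```

On this topology, all four clocks agree on C1 after two rounds. In the real protocol, the lock-step rounds are replaced by asynchronous, drifting state decision events, which is what makes verification of convergence hard.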

The BMC algorithm is distributed, meaning that there is no central node that coordinates the execution of the algorithm. Consider Fig. 6(a), which depicts four devices with separate clocks C1, C2, C3, and C4 that are connected using three networks n1, n2, and n3. Fig. 6(b) depicts the final result after executing BMC. A tree is formed where C1 is the root (the grandmaster). The parent/child relationships are defined by the states of the ports: master (M) and slave (S) indicate parent and child, respectively. Note also that the cycle between C2, C3, and C4 is broken by disabling the link between C2 and C4, which is done by setting one of its ports to passive (P).
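The port states can be illustrated with a hedged sketch of goal (ii), given an already elected grandmaster. A breadth-first search builds a shortest-hop spanning tree. Tree links get a master and a slave port, and every redundant link gets one master and one passive port. The tie-breaking rules (lowest identifier first; the endpoint closer to the grandmaster stays master) are assumptions for illustration. The same holds for the point-to-point topology, which contains a cycle C2, C3, C4 but is not the network of Fig. 6. The standard derives port states from its own data set comparison instead.

```python
from collections import deque
from typing import Dict, List, Tuple

def port_states(gm: str, links: List[Tuple[str, str]]) -> Dict[Tuple[str, str], str]:
    """Map each port (clock, neighbour) to 'M', 'S' or 'P'."""
    nbrs: Dict[str, List[str]] = {}
    for a, b in links:
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    # BFS from the grandmaster; visiting neighbours in sorted order makes
    # the (assumed) lowest-identifier tie-break deterministic.
    dist = {gm: 0}
    parent: Dict[str, str] = {}
    queue = deque([gm])
    while queue:
        c = queue.popleft()
        for n in sorted(nbrs[c]):
            if n not in dist:
                dist[n] = dist[c] + 1
                parent[n] = c
                queue.append(n)
    states: Dict[Tuple[str, str], str] = {}
    for a, b in links:
        if parent.get(b) == a:          # a is b's parent in the tree
            states[(a, b)], states[(b, a)] = "M", "S"
        elif parent.get(a) == b:        # b is a's parent in the tree
            states[(a, b)], states[(b, a)] = "S", "M"
        else:
            # Redundant link: the endpoint closer to the grandmaster (then
            # lower identifier) stays master; the other port goes passive.
            near, far = sorted((a, b), key=lambda c: (dist[c], c))
            states[(near, far)], states[(far, near)] = "M", "P"
    return states

links = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C4", "C2")]
states = port_states("C1", links)
```

Here the tree is C1 to C2, with C2 parenting both C3 and C4. The redundant link between C3 and C4 is disabled by making C4's port towards C3 passive.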
Fig. 7. The figure shows periodic state decision events e1 for clock C1 and announce messages a2 and a3 received from clocks C2 and C3, respectively.


Each port in the network operates logically as a state machine, determining (somewhat simplified) whether it is a master port, a slave port, or a passive port. During execution of the BMC algorithm, each port executes periodically at state decision events to exchange messages, where the (slightly varying) period is termed the announce interval. These events are fired by timers driven by each node's local clock. Because the clocks can start in different states and drift apart from each other, there is no guarantee that the clocks will be synchronized. The only assumption that can be made is that the clock drift is bounded; such a bound is specified by the IEEE 1588 standard. Consider Fig. 7, which shows an example where state decision events e1 at clock C1 are fired periodically and announce messages a2 and a3 are received from clocks C2 and C3, respectively. Announce messages are used by the BMC algorithm to inform the clocks in the network about clock characteristics and to communicate the current best clock. They are the main mechanism used for forming the spanning tree and electing the grandmaster clock.
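The effect of bounded drift on timer-driven events can be illustrated with a simple, assumed clock model. A local clock reads `phase + rate * t` at real time `t`. Here `rate = 1 + rho` for a constant drift `rho` within the bound, and `phase` stands for the unknown initial clock state. The timer fires whenever the local time reaches a multiple of the period. The function name and the constant-drift model are illustrative; real oscillator drift need not be constant.

```python
from typing import List

def decision_event_times(period: float, rate: float, phase: float,
                         n: int) -> List[float]:
    """Real times of the first n state decision events of one clock.

    Assumed model: local time = phase + rate * t, with 0 <= phase < period;
    event k fires when local time reaches k * period, i.e. at real time
    (k * period - phase) / rate.
    """
    return [(k * period - phase) / rate for k in range(1, n + 1)]

# Two clocks at the extremes of an assumed drift bound of 1e-3, both with
# a nominal 2 s period and the same initial phase.
fast = decision_event_times(2.0, 1.001, 0.0, 5)
slow = decision_event_times(2.0, 0.999, 0.0, 5)
skew = [s - f for f, s in zip(fast, slow)]  # grows with every event
```

Each clock's events are evenly spaced in real time (`period / rate`). However, the offset between the two clocks' k-th events grows by about 4 ms per period. This is the growing skew that the model must account for.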

There are several sources of non-determinism during the BMC phase. First, in Fig. 7 the state decision events e1 occur with a period of 2 seconds, but drift slightly with every event. The drift rate is bounded, but the clock skew (the difference in time between two clocks) may increase over time. Second, the length of an announce interval can vary within a tolerance of ±30% (see Section 9.5.8 in [19]). Note, for instance, how announce messages a2 and a3 appear at different times. The jitter caused by sending these messages (e.g., because of internal queues and protocol stacks) can also vary the number of messages received between two consecutive events: one of the announce messages appears once between the first two events, but twice between the second two events.
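The range of message counts between two consecutive events can be bounded with a short calculation, under assumptions stated here. The receiver's state decision events are taken to be exactly `event_period` apart, ignoring its own drift. Successive announce messages from one sender are taken to be between `(1 - tolerance)` and `(1 + tolerance)` times the nominal `announce_interval` apart, with transmission jitter folded into that spacing. The function name is hypothetical.

```python
import math
from typing import Tuple

def announce_count_bounds(event_period: float, announce_interval: float,
                          tolerance: float) -> Tuple[int, int]:
    """Min and max number of announce messages from one sender that can
    arrive in a half-open window of length event_period."""
    shortest = (1 - tolerance) * announce_interval  # tightest spacing
    longest = (1 + tolerance) * announce_interval   # widest spacing
    # With spacing >= shortest, at most ceil(window / shortest) arrivals fit.
    most = math.ceil(event_period / shortest)
    # With spacing <= longest, every window of length `longest` contains an
    # arrival, so at least floor(window / longest) arrivals occur.
    fewest = math.floor(event_period / longest)
    return fewest, most
```

With a 2 s event period, a 2 s announce interval and the ±30% tolerance, the bounds are 0 and 2 messages per window. A model therefore has to allow zero, one, or two announce messages from the same sender between consecutive events, consistent with the variation seen in Fig. 7.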

The challenge we consider in this case study is to verify the correctness of a central aspect of the BMC algorithm: for a specific topology, we verify that the BMC algorithm converges to one specific grandmaster clock. The non-determinism in when announce messages are received and when periodic events occur makes the model-checking problem particularly challenging. In this paper, we address the problem of how to model such non-determinism by providing an analytic solution that abstracts away the real-time aspect of the BMC algorithm and transforms the model-checking problem into one over an untimed model. In this case, events and announce messages are “almost synchronous”, where non-determinism is introduced by bounded clock rates, jitter when sending messages, and unknown initial clock states.


Time-Synchronized Channel Hopping  The time-synchronized channel hopping (TSCH) [1] protocol is being adopted as part of the low-power Medium Access Control (MAC) standard IEEE 802.15.4e. It has a time-slotted architecture, and time slots are grouped into a scheduled superframe that repeats over time. A global schedule instructs each node in which time slot to transmit data to, or receive data from, which node. The TSCH protocol makes the strong assumption that the nodes in the system are time-synchronized within a bound called the ‘guard’ time. Hence, nodes can wake up just before the start of the time slot allotted by the schedule and remain in sleep mode otherwise. In the absence of precise time synchronization, the time slots across nodes would not be aligned within the guard bound. Nodes would then fail to communicate successfully during the allotted slot.
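The way a node follows the global schedule, and what the guard bound requires, can be sketched minimally under two assumptions. The schedule is a hypothetical map from slot offset within the repeating superframe to a (transmitter, receiver) pair, and a node's slot offset is its absolute slot count modulo the superframe length. The alignment check further assumes a symmetric guard window around the receiver's view of the slot start. Channel selection is omitted, and all names are illustrative.

```python
from typing import Dict, Tuple

# Hypothetical global schedule: slot offset -> (transmitter, receiver).
Schedule = Dict[int, Tuple[str, str]]

def node_action(node: str, asn: int, schedule: Schedule,
                superframe_len: int) -> str:
    """What `node` does in absolute slot `asn`: 'tx', 'rx' or 'sleep'."""
    link = schedule.get(asn % superframe_len)
    if link is None:
        return "sleep"
    tx, rx = link
    if node == tx:
        return "tx"
    if node == rx:
        return "rx"
    return "sleep"  # not scheduled in this slot: stay asleep

def slot_aligned(tx_slot_start_us: float, rx_slot_start_us: float,
                 guard_us: float) -> bool:
    # Assumed symmetric guard: the receiver listens from guard_us before to
    # guard_us after its own view of the slot start.
    return abs(tx_slot_start_us - rx_slot_start_us) <= guard_us

schedule: Schedule = {0: ("M", "A"), 2: ("A", "B")}
```

A node only needs to be awake in the slots in which it appears. Communication in such a slot succeeds only if the sender's and receiver's notions of the slot start differ by at most the guard time.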

Nodes keep track of time slots using timers maintained by their local clocks. Over time, clock drift may cause nodes to become desynchronized. A central server computes a global schedule to ensure that each node resynchronizes at least once within the threshold period after which it would become desynchronized. Nodes synchronize on receiving messages from the master node. Hence, periodic successful communication with the master node is essential and must be ensured by the schedule.
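Whether a schedule meets the resynchronization requirement can be checked, ignoring channel losses, by computing the longest cyclic gap between the slots in which a node receives from the master. This sketch assumes single-hop synchronization with the master. It represents the schedule as a hypothetical map from slot offset to a (transmitter, receiver) pair, with offsets smaller than the superframe length. The resulting gap, in slots, would then be compared with the allowed number of slots between resynchronizations.

```python
from typing import Dict, Optional, Tuple

def max_resync_gap(node: str, master: str,
                   schedule: Dict[int, Tuple[str, str]],
                   superframe_len: int) -> Optional[int]:
    """Longest number of slots between consecutive master-to-node slots,
    including the wrap-around into the next superframe; None if none."""
    slots = sorted(off for off, (tx, rx) in schedule.items()
                   if tx == master and rx == node)
    if not slots:
        return None  # the node never hears from the master
    gaps = [b - a for a, b in zip(slots, slots[1:])]
    # Gap from the last slot to the first slot of the next repetition.
    gaps.append(slots[0] + superframe_len - slots[-1])
    return max(gaps)

schedule = {0: ("M", "A"), 3: ("M", "B"), 5: ("M", "A")}
```

For a superframe of 8 slots, node A hears from the master at most 5 slots apart and node B every 8 slots. A node that never appears gets `None`. With lossy channels, a single scheduled slot may fail, which is why the property in the following paragraph has to be checked against a loss model rather than by this static calculation alone.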

The TSCH standard provides no recommendation on building the schedule. It is the responsibility of the central server to compute the right schedule given the worst-case clock drift and the environmental assumptions. Over-synchronization, i.e., communicating more frequently than required, may keep all nodes synchronized, but is undesirable because of power constraints. The challenge is to verify the following reliability property: given a network deployment, worst-case drift, lossy channels, and a global schedule, do all nodes in the system always remain synchronized? The assumption is that the nodes are time-synchronized, and the property to check is that the protocol extended with the schedule ensures that the nodes remain synchronized.


10.5  Parameters  for  Experiments 

BMC Algorithm  Using the set of equations for $N_{\min}$ and the value $\varepsilon = 10^{-3}$, we get:

•  $\Delta = 1$: $N_{\min} = 1001$

•  $\Delta = 2$: $N_{\min} = 2002$


TSCH  In a TSCH network, all nodes are assumed to start communicating at the start of the time slot. To tolerate some desynchronization, the receivers start listening a short time before the start of the time slot and keep listening for some time after it. This duration is called the ‘guard’ time ($T_g$). A typical $T_g$ value is 1 ms. If the system is equipped with 60 ppm crystals, then two nodes can drift apart by 120 µs per second. The synchronization period $T_{sp}$ is calculated using Equation (3). This means that the clocks desynchronize about 8 s after they last communicated. For safety, we require that the nodes communicate every 3 s. If a step in the model corresponds to one time slot and the time-slot size is 100 ms, then the number of steps between two periodic resynchronizations is $N_{period} = 30$.


$$T_{sp} = \frac{T_g}{\mathit{drift}} \qquad (3)$$
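The TSCH parameter arithmetic can be reproduced in a few lines, using the values stated in the text: 1 ms guard time, 60 ppm crystals, resynchronization every 3 s, and 100 ms slots. Two assumptions are made explicit in the code. The two nodes are taken to drift in opposite directions, which gives a relative drift of 2 × 60 ppm. Times are kept in integer microseconds and milliseconds to avoid rounding surprises. The function name is hypothetical.

```python
from typing import Tuple

def tsch_parameters(guard_us: int, crystal_ppm: int,
                    resync_every_ms: int, slot_ms: int) -> Tuple[float, int]:
    """Return (T_sp in seconds, N_period in model steps)."""
    # Worst case: two nodes drift in opposite directions, so their relative
    # drift is 2 * ppm, i.e. that many microseconds per second.
    relative_drift_us_per_s = 2 * crystal_ppm
    t_sp_s = guard_us / relative_drift_us_per_s   # Equation (3)
    # One model step per time slot.
    n_period = resync_every_ms // slot_ms
    return t_sp_s, n_period

t_sp, n_period = tsch_parameters(1000, 60, 3000, 100)
```

This gives $T_{sp} = 1000 / 120 \approx 8.3$ s, matching the "about 8 s" above. Choosing 3 s leaves a safety margin and yields $N_{period} = 30$ steps.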


Schedulers.  The round-robin scheduler cycles over all the nodes in the network periodically. The shared-with-CSMA scheduler has only shared slots and uses the CSMA protocol to resolve conflicts. The priority scheduler uses a predefined priority to determine which node in the system should be scheduled next.
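Two of the three schedulers are simple enough to sketch directly. The CSMA-based one is omitted because it depends on random backoff. Two things are assumptions for illustration: the priority convention (a smaller number means higher priority, with ties broken by node name) and the notion of a set of `ready` nodes from which the priority scheduler chooses. The function names are hypothetical.

```python
from itertools import cycle, islice
from typing import Dict, Iterator, List, Set

def round_robin(nodes: List[str]) -> Iterator[str]:
    # Visit every node in a fixed order, repeating forever.
    return cycle(nodes)

def priority_next(ready: Set[str], priority: Dict[str, int]) -> str:
    # Assumed convention: smaller number = higher priority; ties are broken
    # by name so that the choice is deterministic.
    return min(ready, key=lambda n: (priority[n], n))

first_five = list(islice(round_robin(["A", "B", "C"]), 5))
```

The round-robin scheduler is oblivious to which nodes have traffic. The priority scheduler always serves the most important ready node, which can starve low-priority nodes. That is one reason the choice of scheduler affects whether every node keeps hearing from the master often enough.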


